This document elucidates some of the remote sensing satellite options for creating a rasterized time series of presence/absence of sediment plumes and algal blooms within the western portion of Lake Superior. This summary is limited to the Landsat constellation, Moderate Resolution Imaging Spectroradiometer (MODIS)/Visible Infrared Imaging Radiometer Suite (VIIRS), and Harmonized Landsat-Sentinel (HLS) as they provide 3 spatial and temporal options for raster development. While these are by no means all of the satellite or satellite products available to use for this purpose, these options will inform our decisions for image processing pipeline development. This document summarizes the benefits and trade offs of each of the satellite-derived data products as well as an estimate of what the deliverable product might look like. We will need your feedback to move forward with development.
Below is the area of interest (AOI, green-shaded area) and observed blooms (yellow pins), dating back to 1995 (data provided by Kate Reinl and should not be made public at this time). The observations have been limited to those within our AOI.
Figure 1. Area of interest and observations of blooms from the Algal Bloom and Nutrient Subgroup of the Lake Superior Partnership. Do not distribute this image publicly at this time.
There is an inherent tradeoff between satellite image overpass frequency and the resolution (pixel size) of the satellite sensors (Table 1). Satellites with more frequent overpasses will have a larger pixel size. Note the differences in data access between sources. HLS data are not yet available on Google Earth Engine (GEE) and require a different acquisition and processing pipeline. Additionally, while VIIRS is listed here, we do not provide a summary of VIIRS data for availability. VIIRS is the ‘next generation’ of the MODIS data and unless we are going to develop the pipeline on MODIS data, we will not use the VIIRS dataset. We would expect VIIRS to have a similar data inventory to MODIS as described later in the text.
| Satellite/Product | Years of Data Availability | Overpass Frequency | Resolution | Data Access |
|---|---|---|---|---|
| Landsat 4, 5, 7, 8, 9 | LS4: 1983-2001 LS5: 1984-2013 LS7: 1999-2021a LS8: 2013-current LS9: 2021-current |
16 days per satellite mission | 30m2b | Google Earth Engine |
| MODIS (Aqua/Terra) | Aqua: 2000-current Terra: 2002-current |
~ 2 days per satellite a processed ‘daily’ product available beginning 2002 |
500m2 | Google Earth Engine |
| VIIRS | 2011-current | daily | 375 & 750m2c | Google Earth Engine |
| Harmonized Landsat-Sentinel Product | 2013-currentd | 2-3 days | 30m2 | LP DAAC |
Satellite images are processed as tiles. Each of the satellite constellations use different grid systems. To get an idea of the satellite tile area relative to the AOI, below are images of each satellite’s tile outline (red) and our AOI (green-shaded polygon).
Figure 2. Landsat tiles over western Lake Superior. Green-shaded area is AOI. Pictured are World Reference System tiles path 26, rows 27 and 28 in red.
Figure 3. MODIS/VIIRS tile over western Lake Superior. Green-shaded area is AOI. AOI falls wholly within the h11v4 MODIS/VIIRS tile.
Figure 4. HLS/Sentinel tiles over western Lake Superior. Green-shaded area is AOI. Tiles pictured are 15TWN, 15TXN, 15TYM, 15TWM, 15TXM, 15TYM, left to right, top to bottom.
Despite the return frequencies of each of the satellite constellations, the availability of images is defined by cloud cover. Given this, we have summarized the availability of images per satellite constellation/product based on the total AOI. We use a different method to aggregate HLS data versus Landsat and MODIS data due to how they are accessed. Additional context:
For HLS, we are assuming that each tile shown in Figure 4 is weighted equally over the AOI and that cloud cover is equally distributed throughout the tile. This is a very conservative estimate of data availability.
For Landsat and MODIS, we were able to process and clip images in GEE to evaluate a more precise value for proportional area available. We have taken this into account in our summaries below by reducing the threshold for inclusion in summary figures.
Note that the AOI is a hand-drawn polygon, so the proportional values presented represent rough estimates. Given that the calculated AOI area includes some land, all estimates of proportion-of-AOI are underestimates to some degree.
All data are limited to the time frame of April 1-Oct 31. In the weekly summaries, this may mean that the first and last week of each year in Figures 5-8 are underestimates of the available images from any given satellite source.
For each of the summary periods below (weekly, monthly), the first figure provides an estimate of the minimum proportion of the AOI that we would be able to rasterize into open water-plume-bloom. The second figure gives an idea of the likelihood that we can ‘fill in the gaps’ from the first figure. The higher the number of images available, the more likely we will be to fill out the AOI with rasterized open water-plume-bloom information.
Given the data available, when summarized to a weekly time series, MODIS is the only satellite with robust enough data to build a consistent output product at this timestep (Figure 5, 6). Ideally we want to have a high ‘maximum proportional area’ (Figure 5) and a high ‘number of images above threshold’ (Figure 6). These data have been limited to the dates of April through October (inclusive).
Figure 5. Indication of maximum proportional area without clouds in a single satellite-derived image of our AOI per week by satellite product. Because of the high return frequency of MODIS, we have more images available to retrieve complete coverage of our AOI, which results in a greater maximum proportional area value (more yellow tiles). Note that the data gap frequency and lenght in HLS coverage begins in 2021.
Here we summarize the number of images available per week that surpass our proportional quantity threshold of at least 40% of our AOI available for Landsat and MODIS or 30% for our estimate for HLS for analysis.
Figure 6. The number of images above described proportional quantity threshold per week per data source. Many of the HLS and Landsat aggregations have only one or two images above the threshold, which would limit our ability to create an open water-plume-bloom raster on the time step of one week, while MODIS often has between 6-10 images to work with in a given week.
When the data are aggregated on a monthly basis, our options for using other satellite-derived images to produce a more consistent raster increases beyond the MODIS data.
Figure 7. Indication of maximum proportional area without clouds in a single satellite-derived image of our AOI per month by satellite product. Aggregating to a monthly time step would allow us to use any of the data products, though HLS- and Landsat-derived rasters are likely to have some spatial gaps given the maximum proportional area.
Figure 8. The number of images above the proportional quantity threshold per week by satellite product. Note the increase in Landsat scenes beginning in 2001 and the frequency of instances where HLS and Landsat have no images above the proportional quantity threshold.
We will have to consider a variety of aggregation techniques and enumerate uncertainties in our final output product. Here are some general notes/thoughts to keep in mind as we move forward:
We assume the data ‘subset’ (i.e. the available images for any time period) is representative of actual conditions throughout the entire time period of aggregation. Because return frequency is static and cloud cover is stochastic, we can probably assert this without anyone getting too upset.
Any area of our AOI that has fewer stacked images compared to other areas during any aggregated time period may have a greater uncertainty associated with their labels. This may be especially true for those areas where we have labeled “open water” values. There is likely some threshold where there is no impact to uncertainty relative to the number of images included in any raster sub-area.
With data dating back to the 1980s, this is the optimal data source for time series over many decades; however, in order to create a reliable and complete raster data set the data need to be aggregated to a monthly time step. Although the pixel resolution is sufficient for identification of floating-mat blooms, we may miss shoulder-season events in April/May and September/October due to limited images above our proportional quantity threshold (Figure 8). The output of a pipeline built on Landsat would likely be a monthly raster of 50-100m pixels from 1985-2022.
MODIS data availability and extent is the most optimal of the three satellite constellations shown here. The major drawback is resolvable pixel size (500-1000 m2). For our particular research question, we may miss blooms completely due to this. The output of a pipeline built on MODIS imagery would likely be a weekly raster of 500m pixels from 2000-2022.
A big part of the novelty of the HLS data acquisition and analysis pipeline is using STAC (the Spatialtemporal Asset Catalog) to request, process, and gather images and we have not yet built out this pipeline beyond taking inventory of the available data. This product shares many of the drawbacks listed in the Landsat section above, though the more frequent gaps in data (represented by lower counts of images in Figure 8) may mean we ‘miss’ bloom events, particularly in 2021-2022 when the data become more limited (which is clear in Figure 5). The output of a pipeline based on HLS data would likely be a monthly raster of a 50-100m resolution at a slightly higher or uncertainty than the Landsat output. The HLS-derived raster would be limited to the time period 2013-2022.
Given the constraints, caveats, and tradeoffs described above, we now need direction on what the raster output should look like to help guide in the creation of a data processing pipeline. This includes feedback on the temporal time step, the optimal raster pixel size, and the initial satellite product to develop the process.
To be thorough, we will likely try to extract images for all of the described satellites above where there are overlapping- and/or neighboring-by-date observations of algal blooms across our AOI (as shown in Figure 1). This will help us investigate how we can incorporate the observed dates of blooms from Kate in our pipeline. If there is insufficient data to inform an algorithm of this nature, we will pivot to using established algorithms for estimating chlorophyll-a from image bandmath. We will be working on this the week prior to your arrival at CSU to help inform our discussions together.
While there are clustering techniques built within GEE, the techniques are limited as well as any ability to enumerate uncertainty. In order to process either STAC- or GEE-acquired image collections with clustering techniques and image segmentation, all pipelines will require hardware for local storage and computing power.